Online Supplement to “The Knowledge-Gradient Policy for Correlated Normal Beliefs”

Authors

Peter Frazier

Warren Powell

Savas Dayanik

Abstract

As discussed in Section 3 of the main paper, the KG policy possesses several optimality and convergence properties. First, it is optimal by construction when N = 1 (Remark 1). Second, the suboptimality gap between the values of the KG policy and the optimal policy narrows to 0 as N → ∞ (Theorem 4). This is a convergence result: it shows that when sampling under the KG policy we are guaranteed to eventually discover the alternative that is truly best. Third, the suboptimality gap is bounded for values of N between these two extremes (Theorem 5). Here we discuss and prove the latter two results: the convergence result in Section A.2 and the general bound on suboptimality in Section A.3. These results extend those proved in Frazier et al. (2008) for independent normal priors.
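The abstract does not restate how the KG decision itself is computed, so here is a minimal sketch of the one-step look-ahead behind the N = 1 claim, assuming the standard conjugate setup: belief theta ~ N(mu, Sigma), one measurement of alternative x with known noise variance lambda_x (the parameter noise_var below). Under that assumption the posterior mean after one measurement is mu + sigma_tilde * Z, with sigma_tilde = Sigma e_x / sqrt(lambda_x + Sigma_xx) and Z standard normal, so the KG factor is E[max_i (mu_i + sigma_tilde_i * Z)] - max_i mu_i. Picking the x with the largest factor maximizes the expected best posterior mean after a single measurement, which is why KG is optimal when N = 1. The function names (expected_max_affine, kg_factor, kg_decision, mc_kg_factor) are hypothetical, and the upper-envelope evaluation of the expectation is our own illustration, not code from the paper.

```python
import math
import random


def phi(z):
    """Standard normal density; returns 0.0 at +/-inf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF; returns 0.0 at -inf and 1.0 at +inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_max_affine(a, b):
    """Exact E[max_i (a[i] + b[i] * Z)] for Z ~ N(0, 1).

    Only lines on the upper envelope contribute; envelope line k is the
    maximum for Z in (starts[k], ends[k]).
    """
    hull, starts = [], []  # hull[k] = (slope, intercept)
    for bi, ai in sorted(zip(b, a)):  # ascending slope, then intercept
        if hull and hull[-1][0] == bi:
            # Equal slopes: the new line has the larger intercept and dominates.
            hull.pop()
            starts.pop()
        z = -math.inf
        while hull:
            bt, at = hull[-1]
            z = (at - ai) / (bi - bt)  # where the new line overtakes the top line
            if z <= starts[-1]:
                # The top line is never strictly the maximum; discard it.
                hull.pop()
                starts.pop()
                z = -math.inf
            else:
                break
        hull.append((bi, ai))
        starts.append(z)
    ends = starts[1:] + [math.inf]
    # E[(a + b Z) 1{lo < Z < hi}] = a (Phi(hi) - Phi(lo)) + b (phi(lo) - phi(hi))
    return sum(ai * (Phi(hi) - Phi(lo)) + bi * (phi(lo) - phi(hi))
               for (bi, ai), lo, hi in zip(hull, starts, ends))


def sigma_tilde(Sigma, noise_var, x):
    """Change in the posterior mean vector per unit of the standardized Z."""
    scale = math.sqrt(noise_var[x] + Sigma[x][x])
    return [Sigma[i][x] / scale for i in range(len(Sigma))]


def kg_factor(mu, Sigma, noise_var, x):
    """One-step KG value of measuring alternative x under belief N(mu, Sigma)."""
    return expected_max_affine(mu, sigma_tilde(Sigma, noise_var, x)) - max(mu)


def kg_decision(mu, Sigma, noise_var):
    """Alternative with the largest KG factor, together with all factors."""
    factors = [kg_factor(mu, Sigma, noise_var, x) for x in range(len(mu))]
    best = max(range(len(mu)), key=lambda x: factors[x])
    return best, factors


def mc_kg_factor(mu, Sigma, noise_var, x, n=200_000, seed=0):
    """Monte Carlo cross-check: simulate Z directly instead of using the envelope."""
    rng = random.Random(seed)
    st = sigma_tilde(Sigma, noise_var, x)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        total += max(m + s * z for m, s in zip(mu, st))
    return total / n - max(mu)


if __name__ == "__main__":
    mu = [0.0, 0.5, 1.0]
    Sigma = [[1.0, 0.6, 0.2], [0.6, 1.5, 0.4], [0.2, 0.4, 0.8]]
    noise_var = [1.0, 1.0, 1.0]
    best, factors = kg_decision(mu, Sigma, noise_var)
    print(best, [round(v, 4) for v in factors])
```

With a diagonal Sigma the factor reduces to the familiar independent-normal form sigma_tilde_x * f(-|mu_x - max_{x' != x} mu_x'| / sigma_tilde_x), where f(z) = z Phi(z) + phi(z). With correlated beliefs every posterior mean moves with the same scalar Z, so in this sketch the expectation only requires sorting the lines and discarding those that are never the maximum.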

Similar resources

The knowledge gradient algorithm for online learning

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. The resulting decision rule easily extends to a variety of settings, including the case where our prior beliefs about the rewards are correlated. Experiments show that the KG policy performs competitively against other learning policies in diverse situations. In the ca...

The Knowledge-Gradient Policy for Correlated Normal Beliefs

We consider a Bayesian ranking and selection problem with independent normal rewards and a correlated multivariate normal belief on the mean values of these rewards. Because this formulation of the ranking and selection problem models dependence between alternatives’ mean values, algorithms may utilize this dependence to perform efficiently even when the number of alternatives is very large. We...

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the opti...

The effect of language complexity and group size on knowledge construction: Implications for online learning

This study investigated the effect of language complexity and group size on knowledge construction in two online debates. Knowledge construction was assessed using Gunawardena et al.’s Interaction Analysis Model (1997). Language complexity was determined by dividing the number of unique words by total words. It refers to the lexical variation. The results showed that...

Optimal learning for sequential sampling with non-parametric beliefs

We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a different bandwidth to achieve better aggregation. The final estimate uses a weigh...

Publication date: 2008
